Active Learning from Weak and Strong Labelers

Authors

  • Chicheng Zhang
  • Kamalika Chaudhuri
Abstract

An active learner is given a hypothesis class, a large set of unlabeled examples, and the ability to interactively query an oracle for the labels of a subset of these examples; the learner's goal is to learn a hypothesis in the class that fits the data well while making as few label queries as possible. This work addresses active learning with labels obtained from strong and weak labelers: in addition to the standard active learning setting, there is an extra weak labeler that may occasionally provide incorrect labels. An example is learning to classify medical images, where expensive labels may be obtained from a physician (the oracle, or strong labeler) and cheaper but occasionally incorrect labels may be obtained from a medical resident (the weak labeler). Our goal is to learn a classifier with low error on data labeled by the oracle, while using the weak labeler to reduce the number of label queries made to the oracle. We provide an active learning algorithm for this setting, establish its statistical consistency, and analyze its label complexity to characterize when it can provide label savings over using the strong labeler alone.
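
To make the setting concrete, the following is a toy sketch only, not the algorithm from the paper. It assumes a one-dimensional threshold concept, a weak labeler that errs only in a narrow band around the true boundary, and a routing rule (`weak_is_trusted`) that is simply given in advance; all function names, the threshold 0.123, and the band widths are hypothetical choices made for illustration. The learner runs a binary search over a sorted pool and sends a query to the oracle only where the weak labeler is not trusted.

```python
from typing import List, Tuple

THETA = 0.123  # hypothetical true decision boundary (assumption for this toy)


def strong_label(x: float) -> int:
    """Oracle / strong labeler: always returns the true label."""
    return int(x >= THETA)


def weak_label(x: float, band: float = 0.02) -> int:
    """Weak labeler: correct far from the boundary, wrong inside a band around it."""
    y = strong_label(x)
    return 1 - y if abs(x - THETA) < band else y


def weak_is_trusted(x: float) -> bool:
    """Assumed routing rule: the weak labeler is trusted outside [0.10, 0.15].

    Here the rule is simply handed to the learner; in a real problem no such
    rule is known up front.
    """
    return not (0.10 <= x <= 0.15)


def learn_threshold(pool: List[float]) -> Tuple[float, int, int]:
    """Binary search for the first positive point in a sorted pool.

    Assumes the last pool point is positive.
    Returns (first positive point, oracle queries, weak-labeler queries).
    """
    lo, hi = 0, len(pool) - 1  # invariant: first positive index lies in [lo, hi]
    n_strong = n_weak = 0
    while lo < hi:
        mid = (lo + hi) // 2
        x = pool[mid]
        if weak_is_trusted(x):
            y = weak_label(x)
            n_weak += 1
        else:
            y = strong_label(x)
            n_strong += 1
        if y == 1:
            hi = mid
        else:
            lo = mid + 1
    return pool[lo], n_strong, n_weak


pool = [i / 1000 for i in range(1001)]
boundary, n_strong, n_weak = learn_threshold(pool)
print(boundary, n_strong, n_weak)
```

In this toy the early, coarse queries fall far from the boundary and go to the weak labeler, while the final queries near the boundary go to the oracle. Whether comparable savings are possible when the reliable region of the weak labeler is not known beforehand is the kind of question the label complexity analysis in the abstract is concerned with.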

Similar resources

Knowledge Transfer for Multi-labeler Active Learning

In this paper, we address multi-labeler active learning, where data labels can be acquired from multiple labelers with various levels of expertise. Because obtaining labels for data instances can be very costly and time-consuming, it is highly desirable to model each labeler’s expertise and only to query an instance’s label from the labeler with the best expertise. However, in an active learnin...

Cost-Effective Active Learning from Diverse Labelers

In traditional active learning, there is only one labeler that always returns the ground truth of queried labels. However, in many applications, multiple labelers are available to offer diverse qualities of labeling with different costs. In this paper, we perform active selection on both instances and labelers, aiming to improve the classification model most with the lowest cost. While the cost...

Active Learning for Crowdsourcing Using Knowledge Transfer

This paper studies the active learning problem in crowdsourcing settings, where multiple imperfect annotators with varying levels of expertise are available for labeling the data in a given task. Annotations collected from these labelers may be noisy and unreliable, and the quality of labeled data needs to be maintained for data mining tasks. Previous solutions have attempted to estimate indivi...

Efficient PAC Learning from the Crowd

In recent years crowdsourcing has become the method of choice for gathering labeled training data for learning algorithms. Standard approaches to crowdsourcing view the process of acquiring labeled data separately from the process of learning a classifier from the gathered data. This can give rise to computational and statistical challenges. For example, in most cases there are no known computa...

Who Should Label What? Instance Allocation in Multiple Expert Active Learning

The active learning (AL) framework is an increasingly popular strategy for reducing the amount of human labeling effort required to induce a predictive model. Most work in AL has assumed that a single, infallible oracle provides labels requested by the learner at a fixed cost. However, real-world applications suitable for AL often include multiple domain experts who provide labels of varying co...

Publication date: 2015